
Conversation

ggerganov (Member) commented on Jun 1, 2025


fix #12433 (comment)

I suspect the reference configs for Gemma 27B v2 and v3 are borked:

It does not make sense to normalize the Q tensor with hidden_size / num_heads. It should be normalized with head_size, like all other models.

This change improves perplexity (PPL) and fixes the catastrophic generation quality at large contexts (see #12433 (comment)).
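For context, the two conventions differ because hidden_size / num_heads does not equal head_size for the 27B configs. The sketch below is not llama.cpp code; it only illustrates the difference, using assumed Gemma 2 27B dimensions (hidden_size = 4608, num_heads = 32, head_dim = 128) that are not taken from this PR.

```cpp
// Minimal sketch (not llama.cpp code) comparing the two attention scales.
// The dimensions below are assumed Gemma 2 27B values, for illustration only.
#include <cmath>
#include <cstdio>

int main() {
    const float hidden_size = 4608.0f; // model embedding dim (assumed)
    const float num_heads   = 32.0f;   // attention heads (assumed)
    const float head_dim    = 128.0f;  // per-head dimension (assumed)

    // Scale implied by the reference config: 1/sqrt(hidden_size / num_heads) = 1/sqrt(144)
    const float scale_config = 1.0f / std::sqrt(hidden_size / num_heads);

    // Scale applied here, consistent with other models: 1/sqrt(head_dim) = 1/sqrt(128)
    const float scale_head = 1.0f / std::sqrt(head_dim);

    std::printf("config scale: %.6f, head_dim scale: %.6f\n", scale_config, scale_head);
    return 0;
}
```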

ggerganov marked this pull request as a draft on June 1, 2025 at 19:13
ggerganov changed the title from "gemma : fix attn scale for 27B" to "gemma : more consistent attention scaling for v2 and v3" on June 2, 2025
ggerganov marked this pull request as ready for review on June 2, 2025 at 15:58
ggerganov merged commit 5582c49 into master on June 2, 2025
46 checks passed
ggerganov deleted the gg/gemma-fix-attn-scale branch on June 2, 2025 at 17:54
furyhawk pushed a commit to furyhawk/llama.cpp that referenced this pull request on Jun 6, 2025:
* gemma : fix attn scale for 27B

* cont : apply scale before attn

* cont : consistent attention scaling

Successfully merging this pull request may close these issues.

Eval bug: Gemma3 <unused32> spam
